Abstract. Iterative methods that operate with the full Hamiltonian matrix in
the untrimmed Hilbert space of a finite system continue to be important tools
for the study of one- and two-dimensional quantum spin models, in particular
in the presence of frustration. To reach sensible system sizes such numerical
calculations heavily depend on the use of symmetries. We describe a
divide-and-conquer strategy for implementing translation symmetries of finite
spin clusters, which efficiently uses and extends the "sublattice coding" of
H.Q. Lin [1]. With this method the Hamiltonian matrix can be generated
on-the-fly in each matrix vector multiplication and problem dimensions beyond
$10^{11}$ become accessible.

1. Introduction

Lattice spin models have attracted continuous research activity, from the early
days of quantum mechanics until the present. Unfortunately, only very few spin
models can be solved analytically and in the thermodynamic limit, with
geometries usually restricted to one dimension [2, 3]. Numerical methods,
therefore, are indispensable for an understanding of quantum spin models.
Density matrix renormalisation [4] and related variational approaches have
revolutionised the study of one-dimensional systems and are able to deliver
very precise results for eigenstates and correlation functions. However,
two-dimensional spin systems, systems with frustrated interactions [5] and
dynamic correlations in such systems are still a domain for "exact" iterative
methods, which rely on the construction of the full Hamiltonian matrix for
finite clusters, thereby making use of the inherent symmetries.

In this work we consider spin models on finite lattices with periodic boundary
conditions and describe an efficient approach for the construction of a
translation symmetric basis of the corresponding Hilbert space. The underlying
ideas are related to divide-and-conquer strategies and the fast Fourier
transform. The core decomposition trick that we use was invented more than two
decades ago by H.Q. Lin [1], but for unknown reasons did not really catch on.
For many years the Hilbert space dimensions reached in studies of quantum spin
models therefore lagged behind those of simulations of Hubbard-type models or
electron-phonon models.

2. Standard approach 

Let us start with a short review of the common method for the construction
of translation symmetric spin states and the performance issues connected to it.
Consider a quantum spin model with translation symmetry on a one-dimensional
lattice with n sites and periodic boundary conditions, $\mathbf{s}_n = \mathbf{s}_0$,

The Hamiltonian H commutes with the total spin and its components
$S^\alpha = \sum_{i=0}^{n-1} s_i^\alpha$, $\alpha \in \{x, y, z\}$, and with the
translation operator

$$T : \mathbf{s}_i \to \mathbf{s}_{i+1} . \qquad (2)$$

Assuming local spins with amplitude $|\mathbf{s}_i| = 1/2$, the Hilbert space of
the n-site system is the product of n two-dimensional spaces and has dimension
$2^n$. Using the conservation of $S^z$ this space can be decomposed into n + 1
subspaces corresponding to the eigenvalues of $S^z$, $-n/2, -n/2 + 1, \dots, n/2$.
Fixing $u = S^z + n/2$, a subspace is spanned by all products of u up-spins and
n - u down-spins (the two eigenstates of $s^z$) and its dimension is
$\binom{n}{u}$, where obviously $\sum_{u=0}^{n} \binom{n}{u} = 2^n$. On a
computer these states are usually represented as bit patterns of length n and
the above decomposition is equivalent to grouping patterns according to their
digit sum. Understanding bit patterns as integers defines an order and allows
for the construction of ordered lists, which can be efficiently searched for
specific patterns.
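The bit-pattern representation described above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function names `sz_basis` and `index_of` are hypothetical, and storing the basis as a sorted Python list searched with `bisect` is one of several possible choices.

```python
from bisect import bisect_left
from itertools import combinations
from math import comb

def sz_basis(n, u):
    """Sorted list of all n-bit integers with exactly u bits set (u up-spins)."""
    states = []
    for up_sites in combinations(range(n), u):
        states.append(sum(1 << site for site in up_sites))
    states.sort()
    return states

def index_of(states, pattern):
    """Binary search in the ordered list; returns the position or -1 if absent."""
    pos = bisect_left(states, pattern)
    return pos if pos < len(states) and states[pos] == pattern else -1

basis = sz_basis(8, 4)
print(len(basis), comb(8, 4))          # both numbers are the subspace dimension
print(index_of(basis, 0b00001111))     # position of a pattern with digit sum 4
print(index_of(basis, 0b00000111))     # digit sum 3: not in this subspace
```

The binary search costs of the order of log2 of the subspace dimension per look-up, which is what makes ordered lists attractive for locating the patterns generated by the Hamiltonian.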

The above Hamiltonian also conserves the amplitude of the total spin, $S^2$, but
the construction of the corresponding eigenstates is more involved and rarely
adopted in numerical computations on finite clusters. Instead, lattice
symmetries are used to further decompose spaces of given $S^z$ into smaller
subspaces. As the title implies, in this work we focus on the translation
symmetry.

Since the Hamiltonian H commutes with the translation T, the matrix elements
of H between different eigenspaces of T vanish. Thus, if H is expressed in an
orthonormal basis of eigenstates of T, the original problem is split into n
independent pieces each having a dimension roughly a factor of n smaller. The
projection operator

$$P_k = \frac{1}{n} \sum_{j=0}^{n-1} e^{2\pi i jk/n}\, T^j \qquad (3)$$

maps an arbitrary state onto an eigenstate of T with eigenvalue
$\exp(-2\pi i k/n)$, namely

$$T P_k |\psi\rangle = \frac{1}{n} \sum_{j=0}^{n-1} e^{2\pi i jk/n}\, T^{j+1} |\psi\rangle = \exp(-2\pi i k/n)\, P_k |\psi\rangle . \qquad (4)$$

Here we used $T^n = 1$, which also implies $\exp(-2\pi i k) = 1$ and
$k \in \mathbb{Z}$. Due to the periodicity there are only n distinct projectors
and it suffices to consider the momenta $k = 0, 1, \dots, (n-1)$.

The projector $P_k$ maps states which are related to each other by an arbitrary
translation, $|\phi\rangle = T^j |\psi\rangle$, onto the same eigenstate of T
(up to a phase factor). To avoid this ambiguity and obtain a symmetrised basis
we therefore need to partition the set of all $S^z$ eigenstates, $\mathcal{S}$,
into orbits, i.e., disjoint subsets $\mathcal{S}_r$ that are closed under the
translation group,

$$\forall\, |\psi\rangle \in \mathcal{S}_r : \; T |\psi\rangle \in \mathcal{S}_r . \qquad (5)$$

Each orbit can be represented by one of its elements. For instance, if we use
bit patterns to represent $S^z$ eigenstates, we can sort all elements of an
orbit by the corresponding integer values and choose the smallest as the
representative of the orbit. All other members of this orbit are obtained from
the representative by applying all translations $T^j$, $j = 1, \dots, (n-1)$.
Figure 1 illustrates this decomposition for the case n = 4, where $\mathcal{S}$
contains $2^4 = 16$ elements which belong to 6 disjoint orbits. The sizes of the
orbits differ, since some of the bit patterns are invariant under non-trivial
subgroups of the full translation group of the n-site lattice. The orbit size
is then given by n divided by the order of the subgroup.

Figure 1. Decomposition of the set of $S^z$ eigenstates with n = 4
into orbits. Arrows indicate the action of the translation T. The 6
representatives are shown in bold face.
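The orbit decomposition for n = 4 can be reproduced with a short script. The following is a minimal sketch, not the code used for the results of this paper; the helper names are hypothetical, and the translation is implemented as a cyclic bit shift of the integer (the shift direction does not affect the decomposition into orbits).

```python
def translate(pattern, n):
    """Cyclic shift of an n-bit pattern by one site."""
    return ((pattern << 1) | (pattern >> (n - 1))) & ((1 << n) - 1)

def orbits(n):
    """Map each representative (smallest integer of its orbit) to the orbit."""
    seen, result = set(), {}
    for pattern in range(1 << n):      # ascending order: first hit is the minimum
        if pattern in seen:
            continue
        orbit, q = [], pattern
        while q not in orbit:
            orbit.append(q)
            q = translate(q, n)
        seen.update(orbit)
        result[pattern] = orbit
    return result

for rep, orbit in orbits(4).items():
    members = " ".join(format(q, "04b") for q in orbit)
    print(format(rep, "04b"), "size", len(orbit), ":", members)
```

For n = 4 the script lists six orbits whose sizes add up to 16, and every size divides n, in line with the subgroup argument above.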

Let us denote the set of all representatives with $\mathcal{R}$ and the size of
the orbit of $|r\rangle \in \mathcal{R}$ with $\omega_r$. Then, for given
momentum k a translation symmetric basis of the Hilbert space is formed by the
states

$$|k, r\rangle = \sqrt{\nu_{k,r}}\, P_k |r\rangle , \quad |r\rangle \in \mathcal{R} . \qquad (6)$$

The prefactor $\sqrt{\nu_{k,r}}$ ensures that $|k, r\rangle$ is normalised,
$\langle k, r | k, r \rangle = 1$, or it is zero, which means that for momentum
k the representative $|r\rangle$ does not contribute to the basis. More
precisely,

$$\nu_{k,r} = \begin{cases} 0 & \text{if } n/\omega_r \nmid k , \\ \omega_r & \text{otherwise.} \end{cases} \qquad (7)$$

Of course, we can restrict the set $\mathcal{R}$ to states with a given
eigenvalue of $S^z$ (i.e., patterns with given digit sum), which yields a basis
that makes use of both symmetries of H, $S^z$ and T.


3. The Problem

The recipe for the construction of the symmetrised basis $\{|k, r\rangle\}$ does
not look particularly complicated. However, following it becomes very
time-consuming for large n. To find all representatives $\mathcal{R}$ we can
loop over the $2^n$ eigenstates of $S^z$ in $\mathcal{S}$, apply all n
translations and check whether the considered bit pattern has the minimal
integer value within its orbit. If this is the case, it qualifies for the set
$\mathcal{R}$. The process can be improved by memorising bit patterns that were
already encountered in previous orbits, and applying the translations only to
new ones. Still, we need to perform of the order of $2^n$ translations and
store all the patterns.

     n          dimension
    36        252 088 496
    38        930 138 522
    40      3 446 167 860
    42     12 815 663 844
    44     47 820 447 028
    46    178 987 624 514

Table 1. Problem dimensions for $S^z = 0$, k = 0 as a function of
the chain length n.
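The entries of Table 1 can be cross-checked without enumerating any states. For k = 0 every orbit contributes one basis state, so the dimension equals the number of translation orbits of patterns with n/2 bits set, which follows from Burnside's lemma. The sketch below is our own consistency check (the counting formula is standard, but it is not taken from the procedure described in the text); the function names are hypothetical.

```python
from math import comb, gcd

def euler_phi(d):
    """Number of integers in 1..d that are coprime to d."""
    return sum(1 for x in range(1, d + 1) if gcd(x, d) == 1)

def orbit_count(n, u):
    """Number of translation orbits of n-bit patterns with u bits set.

    Burnside's lemma: average the number of patterns fixed by each of the
    n translations, grouped by the divisors d of gcd(n, u).
    """
    g = gcd(n, u)
    total = 0
    for d in range(1, g + 1):
        if g % d == 0:
            total += euler_phi(d) * comb(n // d, u // d)
    return total // n

for n in (36, 38, 40, 42, 44, 46):
    print(n, orbit_count(n, n // 2))
```

Comparing the printed numbers with Table 1 is a quick way to validate a basis construction before starting an expensive calculation.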

The performance issues become more serious once we start to calculate matrix
elements of the Hamiltonian H with respect to the basis $\{|k, r\rangle\}$,

$$\langle k, r' | H | k, r \rangle = \sqrt{\nu_{k,r'} \nu_{k,r}}\, \langle r' | P_k H P_k | r \rangle = \sqrt{\nu_{k,r'} \nu_{k,r}}\, \langle r' | P_k H | r \rangle . \qquad (8)$$

The application of H on a representative $|r\rangle$ yields many different bit
patterns; usually their number is some multiple of the lattice size n. In
general, these bit patterns are not representatives, and we need to find the
orbit they belong to as well as the translation which maps the pattern to the
representative of its orbit. The latter tells us which part of the projector
$P_k$ contributes to the matrix element, in particular, which phase factor. If
we apply all translations to all bit patterns generated by H and then look up
the observed representatives in a list, the construction of the (sparse) matrix
representation of H requires huge amounts of bit operations and processing
time.
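To make the cost of this standard approach concrete, the following sketch performs the brute-force search for a single bit pattern. It is an illustration under our own conventions (hypothetical function names, translation implemented as a cyclic bit shift of the integer), not the implementation of any particular package.

```python
def translate(pattern, n):
    """Cyclic shift of an n-bit pattern by one site."""
    return ((pattern << 1) | (pattern >> (n - 1))) & ((1 << n) - 1)

def find_representative(pattern, n):
    """Return (rep, j) such that applying `translate` j times to rep gives pattern.

    rep is the member of the orbit with the smallest integer value.
    """
    best, best_shift = pattern, 0
    q = pattern
    for shift in range(1, n):
        q = translate(q, n)            # q is the pattern translated `shift` times
        if q < best:
            best, best_shift = q, shift
    # `best_shift` translations map pattern to best, so the remaining
    # n - best_shift translations map best back to pattern.
    return best, (n - best_shift) % n

rep, j = find_representative(0b01101000, 8)
print(format(rep, "08b"), j)
```

Each call needs n - 1 shifts and comparisons, and it has to be repeated for every pattern generated by H, followed by a search for the representative in the ordered list. This is the overhead that the table look-ups of the next section avoid.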

In contrast, for lattice models such as the Hubbard model or electron-phonon
models the Hilbert space is the product of subspaces which can be symmetrised
individually. The subspaces are small enough such that orbit representatives can
be identified through simple table look-ups, and it is common practice to
construct the Hamiltonian matrix on-the-fly in each step of an iterative
calculation. Methods such as the Lanczos eigenvalue solver [6] then need memory
only for a few vectors with the dimension of the Hilbert space, and huge
problems can be studied.

With the standard approach for quantum spin models this is impractical. Instead,
the matrix has to be kept in memory or stored on disk. The former limits the
accessible system sizes, and the latter is not efficient either, since disk
access is slow and the matrix dimensions are huge (cf. Table 1).

The program SPINPACK [7], which is frequently used to calculate the lowest
eigenstates of spin models, follows the above strategies, and the problem of
identifying the orbit and representative for a given bit pattern seriously
limits its performance. The authors of the code even considered implementing
the required bit operations with specialised hardware based on
field-programmable gate arrays (FPGA). [5]

As we will see below, this is not necessary once we have solved the following 
problem: Find a fast and memory efficient algorithm, which for a given arbitrary 
bit pattern identifies the orbit the pattern belongs to and the translation that maps it 
to the representative of this orbit. 

Figure 2. The action of a translation on a lattice decomposed
into two intertwined sublattices.



4. Divide-and-conquer approach

4.1. The basic idea. Let us assume that the number of lattice sites n is even. We
can then divide the set of all sites into two subsets of equal size, such that
the neighbours of a given site all belong to the other sublattice. The
translation T of the entire lattice is then decomposed into two operations: the
exchange of the two sublattices and a translation within one of the two. In
Figure 2 we illustrate this concept for a lattice of n = 8 sites.

Such a decomposition, termed "sublattice coding", was used by H.Q. Lin [1] to
construct the orbit representatives in exact diagonalisation studies of
translation symmetric spin clusters with up to 32 sites. Unfortunately the
description of his algorithm is rather brief and details of the implementation
remain vague. Therefore, the potential of this trick seems to have been missed
(see, e.g., the discussion of Ref. [1] in Refs. [9, 10]). In what follows we
explain our interpretation and extension of the approach, which has been in use
for a couple of years and efficiently handles very large spin systems.

We can formalise the above decomposition of lattice translations by introducing
the "zipper product" of two bit patterns $|a\rangle = (a_0, \dots, a_{n/2-1})$
and $|b\rangle = (b_0, \dots, b_{n/2-1})$,

$$(a_0, \dots, a_{n/2-1}) \otimes (b_0, \dots, b_{n/2-1}) := (a_0, b_0, \dots, a_{n/2-1}, b_{n/2-1}) . \qquad (9)$$

For clarity, we indicate the size of the translated lattice as an index to the
translation operator, $T_n$, and assume that it translates patterns to the
right,

$$T_n (a_0, \dots, a_{n-1}) = (a_{n-1}, a_0, \dots, a_{n-2}) . \qquad (10)$$

Then, the above procedure can be summarised as

$$T_n (|a\rangle \otimes |b\rangle) = (T_{n/2} |b\rangle) \otimes |a\rangle . \qquad (11)$$

This equation relates the translations on the n-site lattice to the translations
on the n/2-site sublattice, and we can use it to derive orbits and
representatives of the full lattice from the orbits and representatives of the
sublattice. Multiple application of $T_n$ on a state
$|a\rangle \otimes |b\rangle$ yields

$$\begin{aligned}
T_n^{2j} (|a\rangle \otimes |b\rangle) &= (T_{n/2}^{j} |a\rangle) \otimes (T_{n/2}^{j} |b\rangle) , \\
T_n^{2j+1} (|a\rangle \otimes |b\rangle) &= (T_{n/2}^{j+1} |b\rangle) \otimes (T_{n/2}^{j} |a\rangle)
\end{aligned} \qquad (12)$$

with $j = 0, \dots, (n/2 - 1)$, which illustrates how the orbit of
$|a\rangle \otimes |b\rangle$ is built from the sublattice orbits of
$|a\rangle$ and $|b\rangle$.
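The relations (11) and (12) are easy to verify numerically. In the following sketch patterns are stored as Python tuples $(a_0, \dots, a_{m-1})$ rather than integers, which is our choice for readability; the function names are hypothetical.

```python
from itertools import product

def translate(p):
    """Right translation of a tuple of site values, cf. Eq. (10)."""
    return p[-1:] + p[:-1]

def power(p, j):
    """Apply the translation j times."""
    for _ in range(j):
        p = translate(p)
    return p

def zipper(a, b):
    """Zipper product (a_0, b_0, a_1, b_1, ...), cf. Eq. (9)."""
    return tuple(x for pair in zip(a, b) for x in pair)

m = 4                                   # sublattice size n/2
patterns = list(product((0, 1), repeat=m))

# Eq. (11): one translation of the full lattice
eq11 = all(translate(zipper(a, b)) == zipper(translate(b), a)
           for a in patterns for b in patterns)

# Eq. (12): even and odd powers of the full-lattice translation
eq12 = all(power(zipper(a, b), 2 * j) == zipper(power(a, j), power(b, j))
           and power(zipper(a, b), 2 * j + 1) == zipper(power(b, j + 1), power(a, j))
           for a in patterns for b in patterns for j in range(m))

print(eq11, eq12)
```

Both checks run over all $2^4 \times 2^4$ pairs of sublattice patterns for n = 8.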

Let us now consider two representatives of the sublattice,
$|r\rangle, |r'\rangle \in \mathcal{R}_{n/2}$, subject to two conditions: First,
r < r', where the order is defined in terms of the integer value of the bit
patterns. Second, the orbits of both representatives have maximal size n/2,
i.e., $T_{n/2}^{j} |\psi\rangle \neq |\psi\rangle$ for all
$j = 1, \dots, (n/2 - 1)$ and $|\psi\rangle \in \{|r\rangle, |r'\rangle\}$.
Then, the n/2 states

$$|r, r', j\rangle = |r\rangle \otimes (T_{n/2}^{j} |r'\rangle) \quad \text{with } j = 0, \dots, (n/2 - 1) \qquad (13)$$

are representatives for orbits of the full translation group on the n-site
lattice. Since the translation $T_n$ involves the exchange of the two
sublattices (see Eq. (11)), the orbits of the states $|r, r', j\rangle$, which
are given by $T_n^{i} |r, r', j\rangle$ with $i = 0, \dots, (n - 1)$, together
contain all $n^2/2$ states that can be obtained by combining the sublattice
orbits of $|r\rangle$ and $|r'\rangle$, namely
$(T_{n/2}^{j} |r\rangle) \otimes (T_{n/2}^{l} |r'\rangle)$ and
$(T_{n/2}^{l} |r'\rangle) \otimes (T_{n/2}^{j} |r\rangle)$, with
$j, l = 0, \dots, (n/2 - 1)$. This explains the above condition r < r'.

For r = r' the range of j needs to be reduced,

$$|r, r, j\rangle = |r\rangle \otimes (T_{n/2}^{j} |r\rangle) \quad \text{with } j = 0, \dots, \lfloor (n - 1)/4 \rfloor , \qquad (14)$$

as otherwise we would count states twice. Note that for n/2 odd one of the
generated representatives is invariant under $T_n^{n/2}$, i.e., the orbit has
size n/2 only.

The general case, where $r \le r'$ and $|r\rangle$ or $|r'\rangle$ are invariant
under non-trivial subgroups of the n/2-site translation group, leads to further
restrictions on the values of j. In addition, the generated representatives
$|r, r', j\rangle$ will be invariant under subgroups of the n-site translation
group. It is then convenient to identify all subgroups of the n/2-site
translations (which correspond to the divisors of n/2), and to tabulate all
possible combinations and the resulting restrictions on j. These tables are
small and can also hold other basic details, like the orbit size
$\omega_{r,r',j}$ of $|r, r', j\rangle$, which does not depend on r and r'
directly, but only on the maximal subgroups under which $|r\rangle$ and
$|r'\rangle$ are invariant.

What are the advantages of the decomposition into two sublattices? First, the
number of representatives of the sublattice is approximately equal to the square
root of the number of representatives of the full lattice,
$|\mathcal{R}_{n/2}| \approx \sqrt{|\mathcal{R}_n|}$. Hence, the construction of
$\mathcal{R}_n$ from $\mathcal{R}_{n/2}$ is much faster than the traditional
approach we described earlier. Second, $|\mathcal{R}_{n/2}|$ is small enough
that we can store in memory the map
$|\psi\rangle \to T_{n/2}^{j} |r\rangle$ from an arbitrary state
$|\psi\rangle \in \mathcal{S}_{n/2}$ to its representative
$|r\rangle \in \mathcal{R}_{n/2}$ and the corresponding exponent j. Moreover, we
can use these tables to directly identify the representative
$|r\rangle \in \mathcal{R}_n$ and the exponent j for an arbitrary state
$|\psi\rangle \in \mathcal{S}_n$ on the full lattice. Hence, we can solve the
problem of Section 3 with a few look-ups in moderately sized tables (typically
a few megabytes).

4.2. Implementation. We start from a lattice with n/2 sites and determine all
subgroups of the translation group generated by $T_{n/2}$. They are given by the
divisors of n/2, namely, if $d \mid n/2$ then $T_{n/2}^{d}$ generates a
subgroup. For example, setting n/2 = 4 we find three subgroups indexed with g,

    g    d = \omega_g    example
    0         1            0000
    1         2            0101          (15)
    2         4            0001

which match the three different orbit types seen in Figure 1 and the
corresponding orbit sizes $\omega_g$.

Next, we construct representatives for all orbits in $\mathcal{S}_{n/2}$. Since
we are dealing with only half of the target lattice size n, we can use the
approach sketched in the first paragraph of Section 3. For each representative
$|r\rangle \in \mathcal{R}_{n/2}$ we also determine the maximal subgroup g it is
invariant under, i.e., we find the minimal non-zero $d \mid n/2$ such that
$T_{n/2}^{d} |r\rangle = |r\rangle$. For the example n/2 = 4 we obtain

    r    |r>     d    g
    0    0000    1    0
    1    0001    4    2
    2    0011    4    2          (16)
    3    0101    2    1
    4    0111    4    2
    5    1111    1    0

Having selected a set of representatives $\mathcal{R}_{n/2}$, we can tabulate
the map $|\psi\rangle \to T_{n/2}^{j} |r\rangle$, i.e., we can construct an
array which takes the integer value of a bit pattern as the index and returns
both the index r of the corresponding representative $|r\rangle$ and the
exponent j. For the example n/2 = 4 this array, Eq. (17), has $2^4 = 16$
entries.
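The sublattice tables for n/2 = 4 can be generated with a few lines of code. The sketch below rebuilds the list of representatives with their periods d, cf. Eq. (16), and the map from each pattern to the pair (r, j). It rests on assumptions of ours: patterns are tuples with $a_0$ as the leftmost entry, the translation is the right shift of Eq. (10), and j is the smallest exponent with pattern = $T^j$ applied to the representative. With a different convention the exponents in the tabulated array may differ.

```python
from itertools import product

def translate(p):
    """Right translation of a tuple of site values, cf. Eq. (10)."""
    return p[-1:] + p[:-1]

def sublattice_tables(m):
    """Representatives, their periods d, and the map pattern -> (r, j)."""
    reps, periods, lookup = [], [], {}
    for p in product((0, 1), repeat=m):     # ascending integer order
        if p in lookup:
            continue                        # p belongs to an earlier orbit
        r, q, j = len(reps), p, 0
        while q not in lookup:
            lookup[q] = (r, j)              # q equals T^j applied to reps[r]
            q, j = translate(q), j + 1
        reps.append(p)
        periods.append(j)                   # minimal d with T^d |r> = |r>
    return reps, periods, lookup

reps, periods, lookup = sublattice_tables(4)
for r, (p, d) in enumerate(zip(reps, periods)):
    print(r, "".join(map(str, p)), d)
for p in product((0, 1), repeat=4):
    r, j = lookup[p]
    print("".join(map(str, p)), "->", r, j)
```

In a production code the dictionary would be replaced by a flat array indexed by the integer value of the pattern, as described in the text; for n/2 = 20 such an array has about $10^6$ entries.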
Knowing all details about the sublattice with n/2 sites, we can construct a set
of tables which characterise the symmetrised states of the full n-site lattice.
Consider an arbitrary state $|a\rangle \otimes |b\rangle$ on the full lattice:
For both sublattice states, $|a\rangle$ and $|b\rangle$, we can immediately look
up the indices $r_a$ and $r_b$ of the corresponding representatives
$|r_a\rangle$ and $|r_b\rangle$, as well as the exponents $j_a$ and $j_b$. In
addition, given $r_a$ and $r_b$ we know the subgroups $g_a$ and $g_b$ of the
representatives $|r_a\rangle$ and $|r_b\rangle$. The only information missing
for locating $|a\rangle \otimes |b\rangle$ within its orbit,

$$|a\rangle \otimes |b\rangle = T_n^{i} |r, r', j\rangle = T_n^{i} \big( |r\rangle \otimes (T_{n/2}^{j} |r'\rangle) \big) , \qquad (18)$$

are the exponents i and j. These exponents depend only on the values of $j_a$,
$j_b$, $g_a$ and $g_b$, and on the order of $r_a$ and $r_b$, namely, whether
$r_a < r_b$, $r_a = r_b$ or $r_a > r_b$. When we defined the representatives
$|r, r', j\rangle$ of the full lattice in Eq. (13), we demanded r < r'. Thus, if
we encounter $r_a > r_b$, then $|a\rangle \otimes |b\rangle$ is created from the
representative $|r_b, r_a, j\rangle$ by a translation with odd exponent i. For
$r_a < r_b$ the representative is $|r_a, r_b, j\rangle$ and the exponent i is
even. For $r_a = r_b$ other restrictions apply, as discussed in the paragraph of
Eq. (14). We can tabulate all possible cases in three arrays,

$$\begin{aligned}
e^{<} &: \; j_a, j_b, g_a, g_b \to i, j , \\
e^{=} &: \; j_a, j_b, g \to i, j , \\
e^{>} &: \; j_a, j_b, g_a, g_b \to i, j .
\end{aligned} \qquad (19)$$

At first glance these four-dimensional arrays appear large, but the j-indices
take only n/2 different values, and the g-indices even fewer. In Appendices A.1
to A.3 we show the maps $e^{<}$, $e^{=}$ and $e^{>}$ for the lattice with n = 8
sites. To build the arrays we perform a double loop over the subgroups in
Eq. (15), which fixes $g_a$ and $g_b$. Then, for each combination of subgroups
we pick two matching representatives, $r_a$ and $r_b$, and loop in reverse order
over i and j in Eq. (18). Looking up the exponents $j_a$ and $j_b$ of the
resulting state $|a\rangle \otimes |b\rangle$ in Eq. (17) completes the data
required for the arrays. In particular, for each input $j_a$, $j_b$, $g_a$,
$g_b$ the stored values of i, j are the minimal ones.

In Figure 3 we summarise the algorithm to identify both the orbit of an
arbitrary state $|a\rangle \otimes |b\rangle$ and its translation relative to
the orbit's representative. Let us remark that, in general, the representative
$|r, r', j\rangle$ is not the state with minimal integer value within its orbit.
Using direct table look-ups this property is no longer needed, and in our
programs the components of zipped states are usually stored in separate
variables.
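The complete look-up scheme can be sketched for n = 8 as follows. This is a simplified illustration and not the authors' program: patterns are tuples instead of integers, the three arrays $e^{<}$, $e^{=}$, $e^{>}$ are merged into one dictionary whose first key component (-1, 0, +1) encodes the order of $r_a$ and $r_b$, the subgroup index g is replaced by the equivalent orbit size, and the dictionary is filled by looping over all pairs of representatives rather than over one pair per subgroup combination. All function names are hypothetical.

```python
from itertools import product

def translate(p):
    """Right translation of a tuple of site values, cf. Eq. (10)."""
    return p[-1:] + p[:-1]

def power(p, j):
    for _ in range(j):
        p = translate(p)
    return p

def zipper(a, b):
    """Zipper product of two sublattice patterns, cf. Eq. (9)."""
    return tuple(x for pair in zip(a, b) for x in pair)

def sublattice_tables(m):
    """Representatives, their orbit sizes, and the map pattern -> (r, j)."""
    reps, omega, lookup = [], [], {}
    for p in product((0, 1), repeat=m):
        if p in lookup:
            continue
        r, q, j = len(reps), p, 0
        while q not in lookup:
            lookup[q] = (r, j)
            q, j = translate(q), j + 1
        reps.append(p)
        omega.append(j)
    return reps, omega, lookup

def build_e_tables(m, reps, omega, lookup):
    """Tabulate (order, j_a, j_b, omega_a, omega_b) -> (i, j), cf. Eq. (19)."""
    e = {}
    for r, rp in product(range(len(reps)), repeat=2):
        if r > rp:
            continue
        # Reverse loops: the last write keeps the smallest j, then the smallest i.
        for j in reversed(range(m)):
            for i in reversed(range(2 * m)):
                state = power(zipper(reps[r], power(reps[rp], j)), i)
                (ra, ja), (rb, jb) = lookup[state[0::2]], lookup[state[1::2]]
                order = (ra > rb) - (ra < rb)
                e[(order, ja, jb, omega[ra], omega[rb])] = (i, j)
    return e

def locate(a, b, omega, lookup, e):
    """Return (r, r', j, i) such that a (x) b = T_n^i (|r> (x) T^j |r'>)."""
    (ra, ja), (rb, jb) = lookup[a], lookup[b]
    order = (ra > rb) - (ra < rb)
    i, j = e[(order, ja, jb, omega[ra], omega[rb])]
    return min(ra, rb), max(ra, rb), j, i

m = 4
reps, omega, lookup = sublattice_tables(m)
e = build_e_tables(m, reps, omega, lookup)
found = set()
for a, b in product(lookup, repeat=2):
    r, rp, j, i = locate(a, b, omega, lookup, e)
    assert power(zipper(reps[r], power(reps[rp], j)), i) == zipper(a, b)
    found.add((r, rp, j))
print(len(found), "orbits on the 8-site lattice")
```

Once the tables exist, `locate` needs two look-ups in the sublattice table and one in the exponent table, independent of the lattice size. The loop at the end confirms that the returned data reproduce every one of the 256 states of the 8-site lattice.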

The tables $e^{<}$ and $e^{=}$ can also be used to identify the values of j for
which $|r, r', j\rangle$ is a valid representative. In this case the set
$(j_a = 0, j_b = j, g_a, g_b)$ is mapped to $(i = 0, j)$, i.e.,
$|r, r', j\rangle$ is not part of the orbit of some other representative
$|r, r', j'\rangle$ with j' < j. We can store this information together with the
size of the orbit of $|r, r', j\rangle$, which depends only on the corresponding
subgroups g and g'. Similar to the previous arrays, we need to distinguish two
cases, $\omega^{<}_{g,g',j}$ for r < r' and $\omega^{=}_{g,j}$ for r = r'. If j
is invalid, we set $\omega$ to zero, otherwise it will have some integer value
$d \mid n$. In Appendix A.4 we list the latter quantities for n = 8.

Figure 3. Schematic view of the table look-ups required to locate
an arbitrary state $|a\rangle \otimes |b\rangle$ within its orbit.

In analogy to Eqs. (6) and (7), we now know the normalised,
translation-symmetric basis states of the n-site lattice,

$$|k, r, r', j\rangle = \sqrt{\nu_{k,r,r',j}}\, P_k |r, r', j\rangle , \qquad (20)$$

where

$$\nu_{k,r,r',j} = \begin{cases} 0 & \text{if } n/\omega \nmid k , \\ \omega & \text{otherwise,} \end{cases} \qquad (21)$$

with $\omega = \omega^{<}_{g,g',j}$ or $\omega = \omega^{=}_{g,j}$,
respectively. Note that for large n the number of representatives with
$\omega < n$ is negligible compared to those with $\omega = n$ introduced in
Eq. (13). Therefore, in a practical calculation a loop over the whole basis can
include all $r \le r'$ and $j = 0, \dots, (n/2 - 1)$, and the few inactive
states with $\omega = 0$ will waste hardly any resources.
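The selection rule of Eq. (21) can be checked by counting the active basis states in each momentum sector. The sketch below does this by brute force for n = 8, without the sublattice tables; it is meant only as a consistency check, and the function names are hypothetical. Since an orbit of size $\omega$ is active for exactly $\omega$ of the n momenta, the sector dimensions must add up to $2^n$.

```python
def translate(pattern, n):
    """Cyclic shift of an n-bit pattern by one site."""
    return ((pattern << 1) | (pattern >> (n - 1))) & ((1 << n) - 1)

def orbit_sizes(n):
    """Orbit size omega for every translation orbit of n-bit patterns."""
    seen, sizes = set(), []
    for pattern in range(1 << n):
        if pattern in seen:
            continue
        q, size = pattern, 0
        while q not in seen:
            seen.add(q)
            q, size = translate(q, n), size + 1
        sizes.append(size)
    return sizes

def nu(k, omega, n):
    """Normalisation factor: omega if n/omega divides k, zero otherwise."""
    return omega if k % (n // omega) == 0 else 0

n = 8
sizes = orbit_sizes(n)
dims = [sum(1 for w in sizes if nu(k, w, n) > 0) for k in range(n)]
print(dims, sum(dims), 2 ** n)
```

Here the $S^z$ symmetry is not used, so the sum runs over all $2^8$ patterns; restricting `orbit_sizes` to patterns with a fixed digit sum gives the corresponding sector dimensions for fixed $S^z$.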

In the preceding paragraphs we did not take into account the $S^z$ symmetry of
the original spin model (1). However, its inclusion is easy: When constructing
the representatives of the sublattice, $\mathcal{R}_{n/2}$, we also calculate
the $S^z$ eigenvalue of each $|r\rangle$. Then, for the representatives of the
full lattice, $|r, r', j\rangle$, we combine only those r and r' whose spin
values add to the desired $S^z$ of the full lattice. This requires a little
extra bookkeeping, but does not affect the overall performance.



5. Generalisations 

5.1. Two-dimensional lattices. Up to now we considered only one-dimensional
lattices, but the generalisation to two dimensions is straightforward. Again we
demand $n = n_x \times n_y$ to be even. Hence, one or both of $n_x$ and $n_y$
are even, and we can apply the decomposition into sublattices along one of the
two space directions. Another option is the decomposition into a chequerboard
pattern, which can also be used for square clusters with rotated unit cell [11]
and an even number of sites fulfilling $n = n_1^2 + n_2^2$ with
$n_i \in \mathbb{Z}$. The main condition for the decomposition is that the
sublattices each have the same translation group. In Figure 4 we show the
lattice with $20 = 5 \times 4$ sites decomposed along the y-direction and the
lattice with $10 = 3^2 + 1^2$ sites decomposed in chequerboard fashion.

Figure 4. The decompositions of lattices with $20 = 5 \times 4$ and
with $10 = 3^2 + 1^2$ sites into two sublattices.

The construction of representatives for the orbits of the full-lattice
translation group then follows the route described in Section 4. Merely the
number and structure of the subgroups of the translation group differ slightly,
since now the group is generated by two commuting elementary translations $T_x$
and $T_y$. Also the condition for vanishing norm $\nu_{k,r,r',j}$ is more
complicated and will usually be tabulated.

5.2. Reflection symmetry. Apart from being translation symmetric most of the
considered quantum spin models are also invariant under reflections, i.e., the
full lattice symmetry is described by the dihedral group or, in higher
dimensions, by products thereof. In one dimension the reflection operator reads

$$R : \mathbf{s}_i \to \mathbf{s}_{n-1-i} . \qquad (22)$$

It is fully compatible with the lattice decomposition introduced in Section 4,
since R can be written as reflections of both sublattices and exchange of the
two,

$$R (|a\rangle \otimes |b\rangle) = (R |b\rangle) \otimes (R |a\rangle) . \qquad (23)$$

Hence, the reflections can be incorporated into the divide-and-conquer approach
and used for a further reduction of the Hilbert space dimension, or to make the
matrix representation real [9].
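Equation (23) can be confirmed by exhaustive enumeration on a small lattice. The sketch below uses tuples for the patterns, with the reflection acting on the n-site lattice on the left-hand side and on the n/2-site sublattices on the right-hand side; the function names are our own.

```python
from itertools import product

def reflect(p):
    """Reflection s_i -> s_{m-1-i} of a tuple of m site values, cf. Eq. (22)."""
    return p[::-1]

def zipper(a, b):
    """Zipper product of two sublattice patterns, cf. Eq. (9)."""
    return tuple(x for pair in zip(a, b) for x in pair)

m = 4                                   # sublattice size n/2
holds = all(
    reflect(zipper(a, b)) == zipper(reflect(b), reflect(a))
    for a in product((0, 1), repeat=m)
    for b in product((0, 1), repeat=m)
)
print(holds)
```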

5.3. Odd lattice size. The key prerequisite for the decompositions presented in
the preceding sections is the even number of lattice sites. In practice, this
condition is not particularly restrictive, since many of the quantum spin models
studied are anti-ferromagnetic. Fitting long range order or correlations of this
type into a finite cluster usually requires even n. However, lattices with an
odd number of sites could be of interest for certain interaction types,
geometries or spin amplitudes other than one-half.

As long as n is not prime, we can take a small factor $m \mid n$ and split the
lattice into m equal sublattices, such that the left and the right neighbour of
a site belong to the previous and the next sublattice, respectively. An
arbitrary translation of the full lattice then corresponds to a cyclic
permutation of the sublattices and internal translations within the
sublattices.

Consider, for instance, a lattice where the number of sites is a multiple of 3.
We can decompose this lattice into 3 sublattices, as illustrated in Figure 5.
Now, the translation of the full lattice by a single site is equivalent to a
cyclic permutation of the three sublattices and an internal translation within
one of the three, or

$$T_n (|a\rangle \otimes |b\rangle \otimes |c\rangle) = (T_{n/3} |c\rangle) \otimes |a\rangle \otimes |b\rangle . \qquad (24)$$

Knowing the representatives $|r\rangle$ for the sublattice with n/3 sites, we
can build representatives for the full lattice,

$$|r, r', r'', j', j''\rangle = |r\rangle \otimes (T_{n/3}^{j'} |r'\rangle) \otimes (T_{n/3}^{j''} |r''\rangle) . \qquad (25)$$

Figure 5. The action of a translation on a lattice decomposed
into three sublattices of equal size.

The restrictions needed to avoid double counting are more intricate compared to
the bi-partition. There are 6 permutations of 3 objects, and the cycle (123)
connects the even and the odd permutations among each other. Thus, starting from
r < r' < r'' or r > r' > r'' we can reach all possible combinations of
sublattice states. The exponents j' and j'' can, in general, take all values
$0, \dots, (n/3 - 1)$, but some will be switched off with appropriate norm
factors, if two or all representatives are equal or have a higher symmetry.
When we construct the corresponding tables of the orbit sizes
$\omega_{g,g',g'',j',j''}$, we need to differentiate between a number of
different cases. The three states r, r' and r'' can all be different and
arranged in ascending or descending order, there can be equal pairs, or all
three can be the same. Similarly, the three tables $e^{<}$, $e^{=}$ and
$e^{>}$, which for the bi-partite lattice were sufficient to identify the orbit
and the phase factor of an arbitrary state on the full lattice, now generalise
to a whole set of tables covering all possible orderings of r, r' and r''.
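The translation rule of Eq. (24) for three sublattices can be verified in the same way as for the bi-partition. The sketch below checks it exhaustively for n = 9; the three-way zipper `zipper3` is our own notation for the interleaving $(a_0, b_0, c_0, a_1, \dots)$, and patterns are again stored as tuples.

```python
from itertools import product

def translate(p):
    """Right translation of a tuple of site values, cf. Eq. (10)."""
    return p[-1:] + p[:-1]

def zipper3(a, b, c):
    """Interleave three sublattice patterns: (a_0, b_0, c_0, a_1, ...)."""
    return tuple(x for triple in zip(a, b, c) for x in triple)

m = 3                                   # sublattice size n/3
patterns = list(product((0, 1), repeat=m))
holds = all(
    translate(zipper3(a, b, c)) == zipper3(translate(c), a, b)
    for a in patterns for b in patterns for c in patterns
)
print(holds)
```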

So far we have not had a good incentive to study lattices with an odd number
of sites and, therefore, cannot comment on the performance of this setup. An
implementation of the decomposition into three sublattices appears feasible, but
the benefits of decompositions into five or more sublattices seem to be rather
limited.

5.4. Higher spin. The translation symmetry of the lattice and the structure of
the local Hilbert space at each site are more or less independent. Therefore,
the construction of the translation symmetric basis can easily be extended to
systems with spins of amplitude larger than 1/2. The efficient storage of the
map from sublattice states to sublattice representatives,
$|\psi\rangle \to T_{n/2}^{j} |r\rangle$, may require some care. Otherwise, all
steps of the algorithm work as described above.

Table 2. Lanczos calculations of the ground state of the Heisenberg
model using one core of a desktop computer with Intel Xeon 5150
processors at 2.66 GHz, many cores of a compute server with 8
quad-core Opteron 8384 processors at 2.7 GHz, and several nodes
of a high-performance cluster with Power6 processors at 4.7 GHz.
The last column shows the time required for one matrix vector
multiplication (MVM).

6. Performance 

We implemented the divide-and-conquer approach for spin-1/2 chains and
rectangular two-dimensional lattices ($n = n_x \times n_y$) a few years ago, and
used it mainly for the study of correlation functions. The latter can be
efficiently calculated using Chebyshev expansion methods [12, 13], which at
their core require fast matrix vector multiplications. For example, we
calculated a set of static correlation functions [14] and the dynamic
ESR-response [15, 16] of the one-dimensional XXZ model at finite temperature and
finite magnetic field.

Of course, the described basis construction can also be used in Lanczos
calculations of low-energy eigenstates. To give an impression of the performance
of the algorithm, in Table 2 we show the time and memory consumption of several
ground-state calculations for the Heisenberg model on one- and two-dimensional
lattices. Taking into account the translation and the $S^z$ symmetries, the
Hamiltonian matrix is computed on-the-fly in each iteration. For the momenta
considered the matrix is real. Systems with up to 36 sites can be simulated on
desktop computers or powerful laptops, as illustrated by the single-core data
for an older Xeon CPU. For systems with up to 40 sites we use a compute server
with eight quad-core CPUs, and on a decent high-performance cluster [17] we are
able to handle systems with 46 sites, corresponding to a matrix dimension of
$1.8 \times 10^{11}$. The main limiting parameter for these calculations is the
memory required for two double-precision vectors with the dimension of the
Hilbert space. On the largest clusters currently available one could certainly
study systems with 50 sites, which requires approximately 40 TB of memory and is
well below present limits.
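The quoted memory requirement follows from simple arithmetic. The sketch below estimates the dimension of the $S^z = 0$, k = 0 sector by its leading term $\binom{n}{n/2}/n$ (the small contribution of orbits with reduced size is neglected, which is our simplification) and multiplies by the storage for two vectors of 8-byte real numbers. The function names are hypothetical.

```python
from math import comb

def approx_dimension(n):
    """Leading-order size of the S^z = 0, k = 0 sector: C(n, n/2) / n."""
    return comb(n, n // 2) // n

def memory_terabytes(n, vectors=2, bytes_per_entry=8):
    """Memory for the given number of double-precision vectors, in 10^12 bytes."""
    return vectors * bytes_per_entry * approx_dimension(n) / 1e12

for n in (46, 50):
    print(n, approx_dimension(n), round(memory_terabytes(n), 1))
```

For n = 46 the estimate agrees with the last row of Table 1 to about ten significant digits, and for n = 50 it gives a value close to the 40 TB mentioned above.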

7. Conclusions 

We present an efficient algorithm to construct translation symmetric basis
states for quantum spin models on finite lattices with periodic boundary
conditions. The approach extends an old trick by H.Q. Lin [1] and employs a
divide-and-conquer strategy, such that direct table look-ups can be used to map
an arbitrary spin state to its orbit with respect to the translation group. The
Hamiltonian matrix, which in iterative calculations like Lanczos or Chebyshev
expansion needs to be applied repeatedly to a few quantum states, can then be
constructed on-the-fly. This saves large amounts of memory or disk space, and
considerably increases the system size accessible to these types of
simulations.

We thank Rechenzentrum Garching of the Max Planck Society for providing 
compute time on their high-performance clusters. 

Appendix A. Tables for n = 8 



A.1. The map $e^{<}$ : $j_a, j_b, g_a, g_b$